
fix: don't convert sparse matrix formats #282

Open · wants to merge 3 commits into master

Conversation

@adriangb (Owner) commented Jul 23, 2022

Situation:

  • TensorFlow doesn't handle all sparse matrix formats. Its docs say it only supports sorted row-major input, but in practice it appears to convert some other formats itself.
  • Scikit-Learn expects every format either to be handled or to raise an error.

Previously we converted all matrices to lil, but as pointed out in #240 (comment) that was a pretty terrible idea.

This PR tries to be a bit smarter, minimizing conversions / memory use while still complying with Scikit-Learn's and TF's APIs.
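For context, here is a minimal sketch of the kind of handling this aims for. It is illustrative only, not the actual diff; the function name and the set of accepted formats are assumptions.

    # Illustrative sketch: keep CSR untouched, convert the cheap cases, and fail
    # loudly (as scikit-learn expects) for formats TensorFlow can't handle.
    import scipy.sparse as sp

    def _prepare_sparse_input(X: sp.spmatrix) -> sp.csr_matrix:
        fmt = X.getformat()
        if fmt == "csr":
            Xs = X  # pass through, no copy
        elif fmt in ("coo", "csc"):
            Xs = X.tocsr()  # well-defined, relatively cheap conversion
        else:
            raise ValueError(f"TensorFlow does not support the sparse matrix format {fmt!r}")
        if not Xs.has_sorted_indices:  # no-op check; avoids re-sorting sorted matrices
            Xs.sort_indices()  # TF's sparse ops require sorted, row-major indices
        return Xs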

@adriangb (Owner, Author)

@mattalhonte-srm can you give this a try please?

@adriangb changed the title from "fix: pass csr matrices through without conversion" to "fix: don't convert sparse matrix formats" on Jul 23, 2022
@codecov-commenter commented Jul 23, 2022

Codecov Report

Merging #282 (aac68ad) into master (0144439) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #282      +/-   ##
==========================================
+ Coverage   98.28%   98.30%   +0.01%     
==========================================
  Files           7        7              
  Lines         759      765       +6     
==========================================
+ Hits          746      752       +6     
  Misses         13       13              
Impacted Files          Coverage Δ
scikeras/wrappers.py    97.57% <100.00%> (+0.03%) ⬆️


github-actions bot commented Jul 23, 2022

📝 Docs preview for commit aac68ad at: https://www.adriangb.com/scikeras/refs/pull/282/merge/

        Xs_csr.sort_indices()
    elif Xs.getformat() not in ("dok", "lil", "bsr"):
        raise ValueError(
            "TensorFlow does not support the sparse matrix format"
Collaborator

What error does TF/Keras raise if a matrix of this format is passed?

@adriangb (Owner, Author) Jul 23, 2022

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[2] = [37,0] is out of order. Many sparse ops require sorted indices.
E         Use `tf.sparse.reorder` to create a correctly ordered copy.
Traceback
    @traceback_utils.filter_traceback
    def fit(self,
            x=None,
            y=None,
            batch_size=None,
            epochs=1,
            verbose='auto',
            callbacks=None,
            validation_split=0.,
            validation_data=None,
            shuffle=True,
            class_weight=None,
            sample_weight=None,
            initial_epoch=0,
            steps_per_epoch=None,
            validation_steps=None,
            validation_batch_size=None,
            validation_freq=1,
            max_queue_size=10,
            workers=1,
            use_multiprocessing=False):
      """Trains the model for a fixed number of epochs (iterations on a dataset).
    
      Args:
          x: Input data. It could be:
            - A Numpy array (or array-like), or a list of arrays
              (in case the model has multiple inputs).
            - A TensorFlow tensor, or a list of tensors
              (in case the model has multiple inputs).
            - A dict mapping input names to the corresponding array/tensors,
              if the model has named inputs.
            - A `tf.data` dataset. Should return a tuple
              of either `(inputs, targets)` or
              `(inputs, targets, sample_weights)`.
            - A generator or `keras.utils.Sequence` returning `(inputs, targets)`
              or `(inputs, targets, sample_weights)`.
            - A `tf.keras.utils.experimental.DatasetCreator`, which wraps a
              callable that takes a single argument of type
              `tf.distribute.InputContext`, and returns a `tf.data.Dataset`.
              `DatasetCreator` should be used when users prefer to specify the
              per-replica batching and sharding logic for the `Dataset`.
              See `tf.keras.utils.experimental.DatasetCreator` doc for more
              information.
            A more detailed description of unpacking behavior for iterator types
            (Dataset, generator, Sequence) is given below. If using
            `tf.distribute.experimental.ParameterServerStrategy`, only
            `DatasetCreator` type is supported for `x`.
          y: Target data. Like the input data `x`,
            it could be either Numpy array(s) or TensorFlow tensor(s).
            It should be consistent with `x` (you cannot have Numpy inputs and
            tensor targets, or inversely). If `x` is a dataset, generator,
            or `keras.utils.Sequence` instance, `y` should
            not be specified (since targets will be obtained from `x`).
          batch_size: Integer or `None`.
              Number of samples per gradient update.
              If unspecified, `batch_size` will default to 32.
              Do not specify the `batch_size` if your data is in the
              form of datasets, generators, or `keras.utils.Sequence` instances
              (since they generate batches).
          epochs: Integer. Number of epochs to train the model.
              An epoch is an iteration over the entire `x` and `y`
              data provided
              (unless the `steps_per_epoch` flag is set to
              something other than None).
              Note that in conjunction with `initial_epoch`,
              `epochs` is to be understood as "final epoch".
              The model is not trained for a number of iterations
              given by `epochs`, but merely until the epoch
              of index `epochs` is reached.
          verbose: 'auto', 0, 1, or 2. Verbosity mode.
              0 = silent, 1 = progress bar, 2 = one line per epoch.
              'auto' defaults to 1 for most cases, but 2 when used with
              `ParameterServerStrategy`. Note that the progress bar is not
              particularly useful when logged to a file, so verbose=2 is
              recommended when not running interactively (eg, in a production
              environment).
          callbacks: List of `keras.callbacks.Callback` instances.
              List of callbacks to apply during training.
              See `tf.keras.callbacks`. Note `tf.keras.callbacks.ProgbarLogger`
              and `tf.keras.callbacks.History` callbacks are created automatically
              and need not be passed into `model.fit`.
              `tf.keras.callbacks.ProgbarLogger` is created or not based on
              `verbose` argument to `model.fit`.
              Callbacks with batch-level calls are currently unsupported with
              `tf.distribute.experimental.ParameterServerStrategy`, and users are
              advised to implement epoch-level calls instead with an appropriate
              `steps_per_epoch` value.
          validation_split: Float between 0 and 1.
              Fraction of the training data to be used as validation data.
              The model will set apart this fraction of the training data,
              will not train on it, and will evaluate
              the loss and any model metrics
              on this data at the end of each epoch.
              The validation data is selected from the last samples
              in the `x` and `y` data provided, before shuffling. This argument is
              not supported when `x` is a dataset, generator or
              `keras.utils.Sequence` instance.
              If both `validation_data` and `validation_split` are provided,
              `validation_data` will override `validation_split`.
              `validation_split` is not yet supported with
              `tf.distribute.experimental.ParameterServerStrategy`.
          validation_data: Data on which to evaluate
              the loss and any model metrics at the end of each epoch.
              The model will not be trained on this data. Thus, note the fact
              that the validation loss of data provided using `validation_split`
              or `validation_data` is not affected by regularization layers like
              noise and dropout.
              `validation_data` will override `validation_split`.
              `validation_data` could be:
                - A tuple `(x_val, y_val)` of Numpy arrays or tensors.
                - A tuple `(x_val, y_val, val_sample_weights)` of NumPy arrays.
                - A `tf.data.Dataset`.
                - A Python generator or `keras.utils.Sequence` returning
                `(inputs, targets)` or `(inputs, targets, sample_weights)`.
              `validation_data` is not yet supported with
              `tf.distribute.experimental.ParameterServerStrategy`.
          shuffle: Boolean (whether to shuffle the training data
              before each epoch) or str (for 'batch'). This argument is ignored
              when `x` is a generator or an object of tf.data.Dataset.
              'batch' is a special option for dealing
              with the limitations of HDF5 data; it shuffles in batch-sized
              chunks. Has no effect when `steps_per_epoch` is not `None`.
          class_weight: Optional dictionary mapping class indices (integers)
              to a weight (float) value, used for weighting the loss function
              (during training only).
              This can be useful to tell the model to
              "pay more attention" to samples from
              an under-represented class.
          sample_weight: Optional Numpy array of weights for
              the training samples, used for weighting the loss function
              (during training only). You can either pass a flat (1D)
              Numpy array with the same length as the input samples
              (1:1 mapping between weights and samples),
              or in the case of temporal data,
              you can pass a 2D array with shape
              `(samples, sequence_length)`,
              to apply a different weight to every timestep of every sample. This
              argument is not supported when `x` is a dataset, generator, or
             `keras.utils.Sequence` instance, instead provide the sample_weights
              as the third element of `x`.
          initial_epoch: Integer.
              Epoch at which to start training
              (useful for resuming a previous training run).
          steps_per_epoch: Integer or `None`.
              Total number of steps (batches of samples)
              before declaring one epoch finished and starting the
              next epoch. When training with input tensors such as
              TensorFlow data tensors, the default `None` is equal to
              the number of samples in your dataset divided by
              the batch size, or 1 if that cannot be determined. If x is a
              `tf.data` dataset, and 'steps_per_epoch'
              is None, the epoch will run until the input dataset is exhausted.
              When passing an infinitely repeating dataset, you must specify the
              `steps_per_epoch` argument. If `steps_per_epoch=-1` the training
              will run indefinitely with an infinitely repeating dataset.
              This argument is not supported with array inputs.
              When using `tf.distribute.experimental.ParameterServerStrategy`:
                * `steps_per_epoch=None` is not supported.
          validation_steps: Only relevant if `validation_data` is provided and
              is a `tf.data` dataset. Total number of steps (batches of
              samples) to draw before stopping when performing validation
              at the end of every epoch. If 'validation_steps' is None, validation
              will run until the `validation_data` dataset is exhausted. In the
              case of an infinitely repeated dataset, it will run into an
              infinite loop. If 'validation_steps' is specified and only part of
              the dataset will be consumed, the evaluation will start from the
              beginning of the dataset at each epoch. This ensures that the same
              validation samples are used every time.
          validation_batch_size: Integer or `None`.
              Number of samples per validation batch.
              If unspecified, will default to `batch_size`.
              Do not specify the `validation_batch_size` if your data is in the
              form of datasets, generators, or `keras.utils.Sequence` instances
              (since they generate batches).
          validation_freq: Only relevant if validation data is provided. Integer
              or `collections.abc.Container` instance (e.g. list, tuple, etc.).
              If an integer, specifies how many training epochs to run before a
              new validation run is performed, e.g. `validation_freq=2` runs
              validation every 2 epochs. If a Container, specifies the epochs on
              which to run validation, e.g. `validation_freq=[1, 2, 10]` runs
              validation at the end of the 1st, 2nd, and 10th epochs.
          max_queue_size: Integer. Used for generator or `keras.utils.Sequence`
              input only. Maximum size for the generator queue.
              If unspecified, `max_queue_size` will default to 10.
          workers: Integer. Used for generator or `keras.utils.Sequence` input
              only. Maximum number of processes to spin up
              when using process-based threading. If unspecified, `workers`
              will default to 1.
          use_multiprocessing: Boolean. Used for generator or
              `keras.utils.Sequence` input only. If `True`, use process-based
              threading. If unspecified, `use_multiprocessing` will default to
              `False`. Note that because this implementation relies on
              multiprocessing, you should not pass non-picklable arguments to
              the generator as they can't be passed easily to children processes.
    
      Unpacking behavior for iterator-like inputs:
          A common pattern is to pass a tf.data.Dataset, generator, or
        tf.keras.utils.Sequence to the `x` argument of fit, which will in fact
        yield not only features (x) but optionally targets (y) and sample weights.
        Keras requires that the output of such iterator-likes be unambiguous. The
        iterator should return a tuple of length 1, 2, or 3, where the optional
        second and third elements will be used for y and sample_weight
        respectively. Any other type provided will be wrapped in a length one
        tuple, effectively treating everything as 'x'. When yielding dicts, they
        should still adhere to the top-level tuple structure.
        e.g. `({"x0": x0, "x1": x1}, y)`. Keras will not attempt to separate
        features, targets, and weights from the keys of a single dict.
          A notable unsupported data type is the namedtuple. The reason is that
        it behaves like both an ordered datatype (tuple) and a mapping
        datatype (dict). So given a namedtuple of the form:
            `namedtuple("example_tuple", ["y", "x"])`
        it is ambiguous whether to reverse the order of the elements when
        interpreting the value. Even worse is a tuple of the form:
            `namedtuple("other_tuple", ["x", "y", "z"])`
        where it is unclear if the tuple was intended to be unpacked into x, y,
        and sample_weight or passed through as a single element to `x`. As a
        result the data processing code will simply raise a ValueError if it
        encounters a namedtuple. (Along with instructions to remedy the issue.)
    
      Returns:
          A `History` object. Its `History.history` attribute is
          a record of training loss values and metrics values
          at successive epochs, as well as validation loss values
          and validation metrics values (if applicable).
    
      Raises:
          RuntimeError: 1. If the model was never compiled or,
          2. If `model.fit` is  wrapped in `tf.function`.
    
          ValueError: In case of mismatch between the provided input data
              and what the model expects or when the input data is empty.
      """
      base_layer.keras_api_gauge.get_cell('fit').set(True)
      # Legacy graph support is contained in `training_v1.Model`.
      version_utils.disallow_legacy_graph('Model', 'fit')
      self._assert_compile_was_called()
      self._check_call_args('fit')
      _disallow_inside_tf_function('fit')
    
      verbose = _get_verbosity(verbose, self.distribute_strategy)
    
      if validation_split and validation_data is None:
        # Create the validation data using the training data. Only supported for
        # `Tensor` and `NumPy` input.
        (x, y, sample_weight), validation_data = (
            data_adapter.train_validation_split(
                (x, y, sample_weight), validation_split=validation_split))
    
      if validation_data:
        val_x, val_y, val_sample_weight = (
            data_adapter.unpack_x_y_sample_weight(validation_data))
    
      if self.distribute_strategy._should_use_with_coordinator:  # pylint: disable=protected-access
        self._cluster_coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(
            self.distribute_strategy)
    
      with self.distribute_strategy.scope(), \
           training_utils.RespectCompiledTrainableState(self):
        # Creates a `tf.data.Dataset` and handles batch and epoch iteration.
>       data_handler = data_adapter.get_data_handler(
            x=x,
            y=y,
            sample_weight=sample_weight,
            batch_size=batch_size,
            steps_per_epoch=steps_per_epoch,
            initial_epoch=initial_epoch,
            epochs=epochs,
            shuffle=shuffle,
            class_weight=class_weight,
            max_queue_size=max_queue_size,
            workers=workers,
            use_multiprocessing=use_multiprocessing,
            model=self,
            steps_per_execution=self._steps_per_execution)

.venv/lib/python3.10/site-packages/keras/engine/training.py:1358: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = ()
kwargs = {'batch_size': 1000, 'class_weight': None, 'epochs': 1, 'initial_epoch': 0, ...}

    def get_data_handler(*args, **kwargs):
      if getattr(kwargs["model"], "_cluster_coordinator", None):
        return _ClusterCoordinatorDataHandler(*args, **kwargs)
>     return DataHandler(*args, **kwargs)

.venv/lib/python3.10/site-packages/keras/engine/data_adapter.py:1401: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <keras.engine.data_adapter.DataHandler object at 0x7f3cb866c700>
x = <40x3 sparse matrix of type '<class 'numpy.float32'>'
	with 53 stored elements (18 diagonals) in DIAgonal format>
y = array([[2.],
       [2.],
       [3.],
       [2.],
       [1.],
       [2.],
       [0.],
       [1.],
       [2.],
 ...       [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)
sample_weight = None, batch_size = 1000, steps_per_epoch = None
initial_epoch = 0, epochs = 1, shuffle = True, class_weight = None
max_queue_size = 10, workers = 1, use_multiprocessing = False
model = <keras.engine.functional.Functional object at 0x7f3cb8659cf0>
steps_per_execution = <tf.Variable 'Variable:0' shape=() dtype=int64, numpy=1>
distribute = True

    def __init__(self,
                 x,
                 y=None,
                 sample_weight=None,
                 batch_size=None,
                 steps_per_epoch=None,
                 initial_epoch=0,
                 epochs=1,
                 shuffle=False,
                 class_weight=None,
                 max_queue_size=10,
                 workers=1,
                 use_multiprocessing=False,
                 model=None,
                 steps_per_execution=None,
                 distribute=True):
      """Initializes a `DataHandler`.
    
      Arguments:
        x: See `Model.fit`.
        y: See `Model.fit`.
        sample_weight: See `Model.fit`.
        batch_size: See `Model.fit`.
        steps_per_epoch: See `Model.fit`.
        initial_epoch: See `Model.fit`.
        epochs: See `Model.fit`.
        shuffle: See `Model.fit`.
        class_weight: See `Model.fit`.
        max_queue_size: See `Model.fit`.
        workers: See `Model.fit`.
        use_multiprocessing: See `Model.fit`.
        model: The `Model` instance. Needed in order to correctly `build` the
          `Model` using generator-like inputs (see `GeneratorDataAdapter`).
        steps_per_execution: See `Model.compile`.
        distribute: Whether to distribute the `tf.dataset`.
          `PreprocessingLayer.adapt` does not support distributed datasets,
          `Model` should always set this to `True`.
      """
    
      self._initial_epoch = initial_epoch
      self._initial_step = 0
      self._epochs = epochs
      self._insufficient_data = False
      self._model = model
    
      # `steps_per_execution_value` is the cached initial value.
      # `steps_per_execution` is mutable and may be changed by the DataAdapter
      # to handle partial executions.
      if steps_per_execution is None:
        self._steps_per_execution = tf.Variable(1)
      else:
        self._steps_per_execution = steps_per_execution
    
      adapter_cls = select_data_adapter(x, y)
>     self._adapter = adapter_cls(
          x,
          y,
          batch_size=batch_size,
          steps=steps_per_epoch,
          epochs=epochs - initial_epoch,
          sample_weights=sample_weight,
          shuffle=shuffle,
          max_queue_size=max_queue_size,
          workers=workers,
          use_multiprocessing=use_multiprocessing,
          distribution_strategy=tf.distribute.get_strategy(),
          model=model)

.venv/lib/python3.10/site-packages/keras/engine/data_adapter.py:1151: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <keras.engine.data_adapter.CompositeTensorDataAdapter object at 0x7f3cb8659b10>
x = <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cd4112770>
y = <tf.Tensor: shape=(40, 1), dtype=float32, numpy=
array([[2.],
       [2.],
       [3.],
       [2.],
       [1.],
    ...      [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>
sample_weights = None, sample_weight_modes = None, batch_size = 1000
steps = None, shuffle = True
kwargs = {'distribution_strategy': <tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7f3cd8441390>, 'epochs': 1, 'max_queue_size': 10, 'model': <keras.engine.functional.Functional object at 0x7f3cb8659cf0>, ...}
_ = False
inputs = (<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cd4112770>, <tf.Tensor: shape=(40, 1), dtype=f...     [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>)

    def __init__(self,
                 x,
                 y=None,
                 sample_weights=None,
                 sample_weight_modes=None,
                 batch_size=None,
                 steps=None,
                 shuffle=False,
                 **kwargs):
      super(CompositeTensorDataAdapter, self).__init__(x, y, **kwargs)
      x, y, sample_weights = _process_tensorlike((x, y, sample_weights))
      sample_weight_modes = broadcast_sample_weight_modes(
          sample_weights, sample_weight_modes)
    
      # If sample_weights are not specified for an output use 1.0 as weights.
      (sample_weights, _, _) = training_utils.handle_partial_sample_weights(
          y, sample_weights, sample_weight_modes, check_all_flat=True)
    
      inputs = pack_x_y_sample_weight(x, y, sample_weights)
    
>     dataset = tf.data.Dataset.from_tensor_slices(inputs)

.venv/lib/python3.10/site-packages/keras/engine/data_adapter.py:587: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

tensors = (<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cd4112770>, <tf.Tensor: shape=(40, 1), dtype=f...     [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>)
name = None

    @staticmethod
    def from_tensor_slices(tensors, name=None):
      """Creates a `Dataset` whose elements are slices of the given tensors.
    
      The given tensors are sliced along their first dimension. This operation
      preserves the structure of the input tensors, removing the first dimension
      of each tensor and using it as the dataset dimension. All input tensors
      must have the same size in their first dimensions.
    
      >>> # Slicing a 1D tensor produces scalar tensor elements.
      >>> dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])
      >>> list(dataset.as_numpy_iterator())
      [1, 2, 3]
    
      >>> # Slicing a 2D tensor produces 1D tensor elements.
      >>> dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
      >>> list(dataset.as_numpy_iterator())
      [array([1, 2], dtype=int32), array([3, 4], dtype=int32)]
    
      >>> # Slicing a tuple of 1D tensors produces tuple elements containing
      >>> # scalar tensors.
      >>> dataset = tf.data.Dataset.from_tensor_slices(([1, 2], [3, 4], [5, 6]))
      >>> list(dataset.as_numpy_iterator())
      [(1, 3, 5), (2, 4, 6)]
    
      >>> # Dictionary structure is also preserved.
      >>> dataset = tf.data.Dataset.from_tensor_slices({"a": [1, 2], "b": [3, 4]})
      >>> list(dataset.as_numpy_iterator()) == [{'a': 1, 'b': 3},
      ...                                       {'a': 2, 'b': 4}]
      True
    
      >>> # Two tensors can be combined into one Dataset object.
      >>> features = tf.constant([[1, 3], [2, 1], [3, 3]]) # ==> 3x2 tensor
      >>> labels = tf.constant(['A', 'B', 'A']) # ==> 3x1 tensor
      >>> dataset = Dataset.from_tensor_slices((features, labels))
      >>> # Both the features and the labels tensors can be converted
      >>> # to a Dataset object separately and combined after.
      >>> features_dataset = Dataset.from_tensor_slices(features)
      >>> labels_dataset = Dataset.from_tensor_slices(labels)
      >>> dataset = Dataset.zip((features_dataset, labels_dataset))
      >>> # A batched feature and label set can be converted to a Dataset
      >>> # in similar fashion.
      >>> batched_features = tf.constant([[[1, 3], [2, 3]],
      ...                                 [[2, 1], [1, 2]],
      ...                                 [[3, 3], [3, 2]]], shape=(3, 2, 2))
      >>> batched_labels = tf.constant([['A', 'A'],
      ...                               ['B', 'B'],
      ...                               ['A', 'B']], shape=(3, 2, 1))
      >>> dataset = Dataset.from_tensor_slices((batched_features, batched_labels))
      >>> for element in dataset.as_numpy_iterator():
      ...   print(element)
      (array([[1, 3],
             [2, 3]], dtype=int32), array([[b'A'],
             [b'A']], dtype=object))
      (array([[2, 1],
             [1, 2]], dtype=int32), array([[b'B'],
             [b'B']], dtype=object))
      (array([[3, 3],
             [3, 2]], dtype=int32), array([[b'A'],
             [b'B']], dtype=object))
    
      Note that if `tensors` contains a NumPy array, and eager execution is not
      enabled, the values will be embedded in the graph as one or more
      `tf.constant` operations. For large datasets (> 1 GB), this can waste
      memory and run into byte limits of graph serialization. If `tensors`
      contains one or more large NumPy arrays, consider the alternative described
      in [this guide](
      https://tensorflow.org/guide/data#consuming_numpy_arrays).
    
      Args:
        tensors: A dataset element, whose components have the same first
          dimension. Supported values are documented
          [here](https://www.tensorflow.org/guide/data#dataset_structure).
        name: (Optional.) A name for the tf.data operation.
    
      Returns:
        Dataset: A `Dataset`.
      """
>     return TensorSliceDataset(tensors, name=name)

.venv/lib/python3.10/site-packages/tensorflow/python/data/ops/dataset_ops.py:809: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("'TensorSliceDataset' object has no attribute '_structure'") raised in repr()] TensorSliceDataset object at 0x7f3cb865ad70>
element = (<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>, <tf.Tensor: shape=(40, 1), dtype=f...     [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>)
is_files = False, name = None

    def __init__(self, element, is_files=False, name=None):
      """See `Dataset.from_tensor_slices()` for details."""
      element = structure.normalize_element(element)
      batched_spec = structure.type_spec_from_value(element)
>     self._tensors = structure.to_batched_tensor_list(batched_spec, element)

.venv/lib/python3.10/site-packages/tensorflow/python/data/ops/dataset_ops.py:4553: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

element_spec = (SparseTensorSpec(TensorShape([40, 3]), tf.float32), TensorSpec(shape=(40, 1), dtype=tf.float32, name=None))
element = (<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>, <tf.Tensor: shape=(40, 1), dtype=f...     [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>)

    def to_batched_tensor_list(element_spec, element):
      """Returns a tensor list representation of the element.
    
      Args:
        element_spec: A nested structure of `tf.TypeSpec` objects representing to
          element type specification.
        element: The element to convert to tensor list representation.
    
      Returns:
        A tensor list representation of `element`.
    
      Raises:
        ValueError: If `element_spec` and `element` do not have the same number of
          elements or if the two structures are not nested in the same way or the
          rank of any of the tensors in the tensor list representation is 0.
        TypeError: If `element_spec` and `element` differ in the type of sequence
          in any of their substructures.
      """
    
      # pylint: disable=protected-access
      # pylint: disable=g-long-lambda
>     return _to_tensor_list_helper(
          lambda state, spec, component: state + spec._to_batched_tensor_list(
              component), element_spec, element)

.venv/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py:363: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

encode_fn = <function to_batched_tensor_list.<locals>.<lambda> at 0x7f3cd4177880>
element_spec = (SparseTensorSpec(TensorShape([40, 3]), tf.float32), TensorSpec(shape=(40, 1), dtype=tf.float32, name=None))
element = (<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>, <tf.Tensor: shape=(40, 1), dtype=f...     [0.],
       [2.],
       [0.],
       [3.],
       [3.],
       [2.],
       [1.],
       [0.]], dtype=float32)>)

    def _to_tensor_list_helper(encode_fn, element_spec, element):
      """Returns a tensor list representation of the element.
    
      Args:
        encode_fn: Method that constructs a tensor list representation from the
          given element spec and element.
        element_spec: A nested structure of `tf.TypeSpec` objects representing to
          element type specification.
        element: The element to convert to tensor list representation.
    
      Returns:
        A tensor list representation of `element`.
    
      Raises:
        ValueError: If `element_spec` and `element` do not have the same number of
          elements or if the two structures are not nested in the same way.
        TypeError: If `element_spec` and `element` differ in the type of sequence
          in any of their substructures.
      """
    
      nest.assert_same_structure(element_spec, element)
    
      def reduce_fn(state, value):
        spec, component = value
        return encode_fn(state, spec, component)
    
>     return functools.reduce(
          reduce_fn, zip(nest.flatten(element_spec), nest.flatten(element)), [])

.venv/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py:338: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

state = []
value = (SparseTensorSpec(TensorShape([40, 3]), tf.float32), <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>)

    def reduce_fn(state, value):
      spec, component = value
>     return encode_fn(state, spec, component)

.venv/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py:336: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

state = [], spec = SparseTensorSpec(TensorShape([40, 3]), tf.float32)
component = <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>

>   lambda state, spec, component: state + spec._to_batched_tensor_list(
        component), element_spec, element)

.venv/lib/python3.10/site-packages/tensorflow/python/data/util/structure.py:364: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = SparseTensorSpec(TensorShape([40, 3]), tf.float32)
value = <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f3cb866dbd0>

    def _to_batched_tensor_list(self, value):
      dense_shape = tensor_util.constant_value_as_shape(value.dense_shape)
      if self._shape.merge_with(dense_shape).ndims == 0:
        raise ValueError(
            "Unbatching a sparse tensor is only supported for rank >= 1. "
            f"Obtained input: {value}.")
>     return [gen_sparse_ops.serialize_many_sparse(
          value.indices, value.values, value.dense_shape,
          out_type=dtypes.variant)]

.venv/lib/python3.10/site-packages/tensorflow/python/framework/sparse_tensor.py:368: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

sparse_indices = <tf.Tensor: shape=(21, 2), dtype=int64, numpy=
array([[38,  0],
       [39,  1],
       [37,  0],
       [38,  2],
   ...      [ 9,  0],
       [ 6,  1],
       [ 6,  2],
       [ 4,  1],
       [ 5,  2],
       [ 2,  1],
       [ 2,  2]])>
sparse_values = <tf.Tensor: shape=(21,), dtype=float32, numpy=
array([0.81379783, 0.8817354 , 0.84640867, 0.8811032 , 0.952749  ,
    ...806, 0.9446689 ,
       0.87001216, 0.9786183 , 0.92559665, 0.83261985, 0.891773  ,
       0.96366274], dtype=float32)>
sparse_shape = <tf.Tensor: shape=(2,), dtype=int64, numpy=array([40,  3])>
out_type = tf.variant, name = None

    def serialize_many_sparse(sparse_indices, sparse_values, sparse_shape, out_type=_dtypes.string, name=None):
      r"""Serialize an `N`-minibatch `SparseTensor` into an `[N, 3]` `Tensor` object.
    
      The `SparseTensor` must have rank `R` greater than 1, and the first dimension
      is treated as the minibatch dimension.  Elements of the `SparseTensor`
      must be sorted in increasing order of this first dimension.  The serialized
      `SparseTensor` objects going into each row of `serialized_sparse` will have
      rank `R-1`.
    
      The minibatch size `N` is extracted from `sparse_shape[0]`.
    
      Args:
        sparse_indices: A `Tensor` of type `int64`.
          2-D.  The `indices` of the minibatch `SparseTensor`.
        sparse_values: A `Tensor`.
          1-D.  The `values` of the minibatch `SparseTensor`.
        sparse_shape: A `Tensor` of type `int64`.
          1-D.  The `shape` of the minibatch `SparseTensor`.
        out_type: An optional `tf.DType` from: `tf.string, tf.variant`. Defaults to `tf.string`.
          The `dtype` to use for serialization; the supported types are `string`
          (default) and `variant`.
        name: A name for the operation (optional).
    
      Returns:
        A `Tensor` of type `out_type`.
      """
      _ctx = _context._context or _context.context()
      tld = _ctx._thread_local_data
      if tld.is_eager:
        try:
          _result = pywrap_tfe.TFE_Py_FastPathExecute(
            _ctx, "SerializeManySparse", name, sparse_indices, sparse_values,
            sparse_shape, "out_type", out_type)
          return _result
        except _core._NotOkStatusException as e:
>         _ops.raise_from_not_ok_status(e, name)

.venv/lib/python3.10/site-packages/tensorflow/python/ops/gen_sparse_ops.py:496: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

e = _NotOkStatusException(), name = None

    def raise_from_not_ok_status(e, name):
      e.message += (" name: " + name if name is not None else "")
>     raise core._status_to_exception(e) from None  # pylint: disable=protected-access
E     tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[2] = [37,0] is out of order. Many sparse ops require sorted indices.
E         Use `tf.sparse.reorder` to create a correctly ordered copy.
E     
E      [Op:SerializeManySparse]

.venv/lib/python3.10/site-packages/tensorflow/python/framework/ops.py:7164: InvalidArgumentError
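Aside: the fix the error message points at is tf.sparse.reorder, which returns a canonically (row-major) sorted copy of a SparseTensor. A minimal sketch, independent of SciKeras:

    import tensorflow as tf

    # Indices out of order, as with the DIA-format matrix in the traceback above.
    st = tf.SparseTensor(indices=[[2, 0], [0, 1]], values=[1.0, 2.0], dense_shape=[3, 2])
    st_sorted = tf.sparse.reorder(st)  # indices become [[0, 1], [2, 0]], values follow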

Owner Author

If you can think of a better way to handle this compatibility, I am all ears

Collaborator

Why not leave conversion to the correct format (spmatrix with sorted indices) to the user, and raise a ValueError if not correctly configured?

Why would the user want you to handle the conversion? Can you check if the index is already sorted to avoid some overhead? (or do you do that already? how much overhead is added? is "y" sorted too?).

(I haven't reviewed your code closely; 'scuse my ignorant questions)

Owner Author

sort_indices() already checks if it is already sorted: https://github.com/scipy/scipy/blob/4cf21e753cf937d1c6c2d2a0e372fbc1dbbeea81/scipy/sparse/_compressed.py#L1163

I guess we could just not check here and let TensorFlow fail? I don't think Scikit-Learn even checks for this specific case (it just checks formats).
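To illustrate that point (sketch only): scipy tracks whether indices are sorted, so the extra sort_indices() call is essentially free on already-sorted input.

    import numpy as np
    import scipy.sparse as sp

    X = sp.random(40, 3, density=0.2, format="csr", dtype=np.float32)
    print(X.has_sorted_indices)  # truthy for a freshly built CSR matrix
    X.sort_indices()  # returns immediately because the sorted flag is already set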

Collaborator

I see two options:

  1. Raise errors before any fitting is done. This will require tests to ensure that SciKeras is staying current with TF's implementation (and that DOK/LIL/BSR raise errors in TF and SciKeras).
  2. Let TF raise errors with sparse matrices. Then maybe look at the error message and throw a warning/another error, and test to make sure that some sparse matrices work.
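For illustration, option (2) would amount to something like the sketch below. It assumes a compiled Keras model and a tf.SparseTensor input; it is not SciKeras code.

    import tensorflow as tf

    try:
        model.fit(X_sparse, y)  # X_sparse: a tf.SparseTensor built from the user's matrix
    except tf.errors.InvalidArgumentError as exc:
        # Surface a scikit-learn-style error instead of the raw TF message.
        raise ValueError(
            "TensorFlow rejected the sparse input; pass a CSR/COO matrix with sorted indices"
        ) from exc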

Owner Author

Right, and currently (1) is what would be happening.

There are two sorts of changes that I could see happening with (1):

  1. TensorFlow supports a new sparse input format. We won't be able to catch this with tests, but presumably if a user wants to use this they can ask us for the feature and it would be pretty straightforward to implement. I also don't think this is super likely to happen.
  2. TensorFlow drops support for a format. This we would catch with tests. Very unlikely to happen given TensorFlow's backward compatibility guarantees.

The main issues with option (2) are:

  1. The Scikit-Learn API explicitly calls for a ValueError (so technically we would be violating the Scikit-Learn API).
  2. Introspecting into error messages is a bit problematic, they can change at any time, etc.
  3. The errors that get raised are not very user friendly.

@mattalhonte-srm

Heya! Tried this branch, blows up my container. Would it be possible to just have an accept_sparse flag that lets the sparse matrix pass through without anything being done to it? Maybe with a warning that it can cause errors?

@adriangb (Owner, Author)

That's surprising. What happens if you comment out

Xs_csr.sort_indices()

@mattalhonte-srm

Just tried! No luck, still blew up the container. Naively passing the CSR matrix and letting it pass unmolested is the only thing that's worked - some sort of "manual override" flag that just lets you do that would be ideal I think.

@adriangb (Owner, Author)

I'm a bit at a loss. After removing that line, there should be no processing being done at all. Is the branch you said is working for you public? I would love to take a look.

@mattalhonte-srm

It's not, but here's the monkeypatch that made it work:

"""Wrapper for using the Scikit-Learn API with Keras models.
"""
import inspect
import warnings

from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Mapping, Set, Tuple, Type, Union

import numpy as np
import tensorflow as tf

from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin
from sklearn.exceptions import NotFittedError
from sklearn.metrics import accuracy_score as sklearn_accuracy_score
from sklearn.metrics import r2_score as sklearn_r2_score
from sklearn.preprocessing import FunctionTransformer
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.utils.multiclass import type_of_target
from sklearn.utils.validation import _check_sample_weight, check_array, check_X_y
from tensorflow.keras import losses as losses_module
from tensorflow.keras.models import Model
from tensorflow.keras.utils import register_keras_serializable

from scikeras._utils import (
    accepts_kwargs,
    get_loss_class_function_or_string,
    get_metric_class,
    get_optimizer_class,
    has_param,
    route_params,
    try_to_convert_strings_to_classes,
    unflatten_params,
)
from scikeras.utils import loss_name, metric_name
from scikeras.utils.random_state import tensorflow_random_state
from scikeras.utils.transformers import ClassifierLabelEncoder, RegressorTargetEncoder


class BaseWrapper(BaseEstimator):
    """Implementation of the scikit-learn classifier API for Keras.

    Below is a list of SciKeras specific parameters. For details on other parameters,
    please see the `tf.keras.Model documentation <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`_.

    Parameters
    ----------
    model : Union[None, Callable[..., tf.keras.Model], tf.keras.Model], default None
        Used to build the Keras Model. When called,
        must return a compiled instance of a Keras Model
        to be used by `fit`, `predict`, etc.
        If None, you must implement ``_keras_build_fn``.
    optimizer : Union[str, tf.keras.optimizers.Optimizer, Type[tf.keras.optimizers.Optimizer]], default "rmsprop"
        This can be a string for Keras' built in optimizers,
        an instance of tf.keras.optimizers.Optimizer
        or a class inheriting from tf.keras.optimizers.Optimizer.
        Only strings and classes support parameter routing.
    loss : Union[Union[str, tf.keras.losses.Loss, Type[tf.keras.losses.Loss], Callable], None], default None
        The loss function to use for training.
        This can be a string for Keras' built in losses,
        an instance of tf.keras.losses.Loss
        or a class inheriting from tf.keras.losses.Loss .
        Only strings and classes support parameter routing.
    random_state : Union[int, np.random.RandomState, None], default None
        Set the Tensorflow random number generators to a
        reproducible deterministic state using this seed.
        Pass an int for reproducible results across multiple
        function calls.
    warm_start : bool, default False
        If True, subsequent calls to fit will _not_ reset
        the model parameters but *will* reset the epoch to zero.
        If False, subsequent fit calls will reset the entire model.
        This has no impact on partial_fit, which always trains
        for a single epoch starting from the current epoch.
    batch_size : Union[int, None], default None
        Number of samples per gradient update.
        This will be applied to both `fit` and `predict`. To specify different numbers,
        pass `fit__batch_size=32` and `predict__batch_size=1000` (for example).
        To auto-adjust the batch size to use all samples, pass `batch_size=-1`.

    Attributes
    ----------
    model_ : tf.keras.Model
        The instantiated and compiled Keras Model. For pre-built models, this
        will just be a reference to the passed Model instance.
    history_ : Dict[str, List[Any]]
        Dictionary of the format ``{metric_str_name: [epoch_0_data, epoch_1_data, ..., epoch_n_data]}``.
    initialized_ : bool
        True if this estimator has been initialized (i.e. predict can be called upon it).
        Note that this does not guarantee that the model is "fitted": if ``BaseWrapper.initialize``
        was called instead of fit, the model will likely have random weights.
    target_encoder_ : sklearn-transformer
        Transformer used to pre/post process the target y.
    feature_encoder_ : sklearn-transformer
        Transformer used to pre/post process the features/input X.
    n_outputs_expected_ : int
        The number of outputs the Keras Model is expected to have, as determined by ``target_transformer_``.
    target_type_ : str
        One of:

        * 'continuous': y is an array-like of floats that are not all
          integers, and is 1d or a column vector.
        * 'continuous-multioutput': y is a 2d array of floats that are
          not all integers, and both dimensions are of size > 1.
        * 'binary': y contains <= 2 discrete values and is 1d or a column
          vector.
        * 'multiclass': y contains more than two discrete values, is not a
          sequence of sequences, and is 1d or a column vector.
        * 'multiclass-multioutput': y is a 2d array that contains more
          than two discrete values, is not a sequence of sequences, and both
          dimensions are of size > 1.
        * 'multilabel-indicator': y is a label indicator matrix, an array
          of two dimensions with at least two columns, and at most 2 unique
          values.
        * 'unknown': y is array-like but none of the above, such as a 3d
          array, sequence of sequences, or an array of non-sequence objects.
    y_shape_ : Tuple[int]
        Shape of the target y that the estimator was fitted on.
    y_dtype_ : np.dtype
        Dtype of the target y that the estimator was fitted on.
    X_shape_ : Tuple[int]
        Shape of the input X that the estimator was fitted on.
    X_dtype_ : np.dtype
        Dtype of the input X that the estimator was fitted on.
    n_features_in_ : int
        The number of features seen during `fit`.
    """

    _tags = {
        "poor_score": True,
        "multioutput": True,
    }

    _fit_kwargs = {
        # parameters destined to keras.Model.fit
        "batch_size",
        "epochs",
        "verbose",
        "validation_split",
        "shuffle",
        "class_weight",
        "sample_weight",
        "initial_epoch",
        "validation_steps",
        "validation_batch_size",
        "validation_freq",
    }

    _predict_kwargs = {
        # parameters destined to keras.Model.predict
        "batch_size",
        "verbose",
        "steps",
    }

    _compile_kwargs = {
        # parameters destined to keras.Model.compile
        "optimizer",
        "loss",
        "metrics",
        "loss_weights",
        "weighted_metrics",
        "run_eagerly",
    }

    _wrapper_params = {
        # parameters consumed by the wrappers themselves
        "warm_start",
        "random_state",
    }

    _routing_prefixes = {
        "model",
        "fit",
        "compile",
        "predict",
        "optimizer",
        "loss",
        "metrics",
    }

    def __init__(
        self,
        model: Union[None, Callable[..., tf.keras.Model], tf.keras.Model] = None,
        *,
        build_fn: Union[
            None, Callable[..., tf.keras.Model], tf.keras.Model
        ] = None,  # for backwards compatibility
        warm_start: bool = False,
        random_state: Union[int, np.random.RandomState, None] = None,
        optimizer: Union[
            str, tf.keras.optimizers.Optimizer, Type[tf.keras.optimizers.Optimizer]
        ] = "rmsprop",
        loss: Union[
            Union[str, tf.keras.losses.Loss, Type[tf.keras.losses.Loss], Callable], None
        ] = None,
        metrics: Union[
            List[
                Union[
                    str,
                    tf.keras.metrics.Metric,
                    Type[tf.keras.metrics.Metric],
                    Callable,
                ]
            ],
            None,
        ] = None,
        batch_size: Union[int, None] = None,
        validation_batch_size: Union[int, None] = None,
        verbose: int = 1,
        callbacks: Union[
            List[Union[tf.keras.callbacks.Callback, Type[tf.keras.callbacks.Callback]]],
            None,
        ] = None,
        validation_split: float = 0.0,
        shuffle: bool = True,
        run_eagerly: bool = False,
        epochs: int = 1,
        **kwargs,
    ):

        # Parse hardcoded params
        self.model = model
        self.build_fn = build_fn
        self.warm_start = warm_start
        self.random_state = random_state
        self.optimizer = optimizer
        self.loss = loss
        self.metrics = metrics
        self.batch_size = batch_size
        self.validation_batch_size = validation_batch_size
        self.verbose = verbose
        self.callbacks = callbacks
        self.validation_split = validation_split
        self.shuffle = shuffle
        self.run_eagerly = run_eagerly
        self.epochs = epochs

        # Unpack kwargs
        vars(self).update(**kwargs)

        # Save names of kwargs into set
        if kwargs:
            self._user_params = set(kwargs)

    @property
    def __name__(self):
        return "KerasClassifier"

    @property
    def current_epoch(self) -> int:
        """Returns the current training epoch.

        Returns
        -------
        int
            Current training epoch.
        """
        if not hasattr(self, "history_"):
            return 0
        return len(self.history_["loss"])

    @staticmethod
    def _validate_sample_weight(
        X: np.ndarray,
        y: np.ndarray,
        sample_weight: Union[np.ndarray, Iterable],
    ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Validate that the passed sample_weight and ensure it is a Numpy array."""
        sample_weight = _check_sample_weight(
            sample_weight, X, dtype=np.dtype(tf.keras.backend.floatx())
        )
        if np.all(sample_weight == 0):
            raise ValueError(
                "No training samples had any weight; only zeros were passed in sample_weight."
                " That means there's nothing to train on by definition, so training can not be completed."
            )
        # drop any zero sample weights
        # this helps mirror the behavior of sklearn estimators
        # which tend to have higher precisions
        not_dropped_samples = sample_weight != 0
        return (
            X[not_dropped_samples, ...],
            y[not_dropped_samples, ...],
            sample_weight[not_dropped_samples, ...],
        )

    def _check_model_param(self):
        """Checks ``model`` and returns model building
        function to use.

        Raises
        ------
            ValueError: if ``self.model`` is not valid.
        """
        model = self.model
        build_fn = self.build_fn
        if model is None and build_fn is not None:
            model = build_fn
            warnings.warn(
                "``build_fn`` will be renamed to ``model`` in a future release,"
                " at which point use of ``build_fn`` will raise an Error instead."
            )
        if model is None:
            # no model, use this class' _keras_build_fn
            if not hasattr(self, "_keras_build_fn"):
                raise ValueError(
                    "If not using the ``build_fn`` param, "
                    "you must implement ``_keras_build_fn``"
                )
            final_build_fn = self._keras_build_fn
        elif isinstance(model, Model):
            # pre-built Keras Model
            def final_build_fn():
                return model

        elif inspect.isfunction(model):
            if hasattr(self, "_keras_build_fn"):
                raise ValueError(
                    "This class cannot implement ``_keras_build_fn`` if"
                    " using the `model` parameter"
                )
            # a callable method/function
            final_build_fn = model
        else:
            raise TypeError(
                "``model`` must be a callable, a Keras Model instance or None"
            )

        return final_build_fn

    def _get_compile_kwargs(self):
        """Convert all __init__ params destined to
        `compile` into valid kwargs for `Model.compile` by parsing
        routed parameters and compiling optimizers, losses and metrics
        as needed.

        Returns
        -------
        dict
            Dictionary of kwargs for `Model.compile`.
        """
        init_params = self.get_params()
        compile_kwargs = route_params(
            init_params,
            destination="compile",
            pass_filter=self._compile_kwargs,
        )
        compile_kwargs["optimizer"] = try_to_convert_strings_to_classes(
            compile_kwargs["optimizer"], get_optimizer_class
        )
        compile_kwargs["optimizer"] = unflatten_params(
            items=compile_kwargs["optimizer"],
            params=route_params(
                init_params,
                destination="optimizer",
                pass_filter=set(),
                strict=True,
            ),
        )
        compile_kwargs["loss"] = try_to_convert_strings_to_classes(
            compile_kwargs["loss"], get_loss_class_function_or_string
        )
        compile_kwargs["loss"] = unflatten_params(
            items=compile_kwargs["loss"],
            params=route_params(
                init_params,
                destination="loss",
                pass_filter=set(),
                strict=False,
            ),
        )
        compile_kwargs["metrics"] = try_to_convert_strings_to_classes(
            compile_kwargs["metrics"], get_metric_class
        )
        compile_kwargs["metrics"] = unflatten_params(
            items=compile_kwargs["metrics"],
            params=route_params(
                init_params,
                destination="metrics",
                pass_filter=set(),
                strict=False,
            ),
        )
        return compile_kwargs

    def _build_keras_model(self):
        """Build the Keras model.

        This method will process all arguments and call the model building
        function with appropriate arguments.

        Returns
        -------
        tensorflow.keras.Model
            Instantiated and compiled keras Model.
        """
        # dynamically build model, i.e. final_build_fn builds a Keras model

        # determine what type of build_fn to use
        final_build_fn = self._check_model_param()

        # collect parameters
        params = self.get_params()
        build_params = route_params(
            params,
            destination="model",
            pass_filter=getattr(self, "_user_params", set()),
            strict=True,
        )
        compile_kwargs = None
        if has_param(final_build_fn, "meta") or accepts_kwargs(final_build_fn):
            # build_fn accepts `meta`, add it
            build_params["meta"] = self._get_metadata()
        if has_param(final_build_fn, "compile_kwargs") or accepts_kwargs(
            final_build_fn
        ):
            # build_fn accepts `compile_kwargs`, add it
            compile_kwargs = self._get_compile_kwargs()
            build_params["compile_kwargs"] = compile_kwargs
        if has_param(final_build_fn, "params") or accepts_kwargs(final_build_fn):
            # build_fn accepts `params`, i.e. all of get_params()
            build_params["params"] = self.get_params()

        # build model
        if self._random_state is not None:
            with tensorflow_random_state(self._random_state):
                model = final_build_fn(**build_params)
        else:
            model = final_build_fn(**build_params)

        return model

    def _ensure_compiled_model(self) -> None:
        # compile model if user gave us an un-compiled model
        if not (hasattr(self.model_, "loss") and hasattr(self.model_, "optimizer")):
            kw = self._get_compile_kwargs()
            self.model_.compile(**kw)

    def _fit_keras_model(
        self,
        X: Union[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]],
        y: Union[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]],
        sample_weight: Union[np.ndarray, None],
        warm_start: bool,
        epochs: int,
        initial_epoch: int,
        **kwargs,
    ) -> None:
        """Fits the Keras model.

        Parameters
        ----------
        X : Union[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]]
            Training samples, as accepted by tf.keras.Model
        y : Union[np.ndarray, List[np.ndarray], Dict[str, np.ndarray]]
            Target data, as accepted by tf.keras.Model
        sample_weight : Union[np.ndarray, None]
            Sample weights. Ignored by Keras if None.
        warm_start : bool
            If True, don't overwrite the ``history_`` attribute;
            append to it instead.
        epochs : int
            Number of epochs for which the model will be trained.
        initial_epoch : int
            Epoch at which to begin training.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.

        Returns
        -------
        None
            Training history is accumulated on ``self.history_``.
        """

        # Make sure model has a loss function
        loss = self.model_.loss
        no_loss = False
        if isinstance(loss, list) and not any(
            callable(loss_) or isinstance(loss_, str) for loss_ in loss
        ):
            no_loss = True
        if isinstance(loss, dict) and not any(
            callable(loss_) or isinstance(loss_, str) for loss_ in loss.values()
        ):
            no_loss = True
        if no_loss:
            raise ValueError(
                "No valid loss function found."
                " You must provide a loss function to train."
                "\n\nTo resolve this issue, do one of the following:"
                "\n 1. Provide a loss function via the loss parameter."
                "\n 2. Compile your model with a loss function inside the"
                " model-building method."
                "\n\nSee https://www.adriangb.com/scikeras/stable/advanced.html#compilation-of-model"
                " for more information on compiling SciKeras models."
                "\n\nSee https://www.tensorflow.org/api_docs/python/tf/keras/losses"
                " for more information on Keras losses."
            )

        # collect parameters
        params = self.get_params()
        fit_args = route_params(params, destination="fit", pass_filter=self._fit_kwargs)
        fit_args["sample_weight"] = sample_weight
        fit_args["epochs"] = initial_epoch + epochs
        fit_args["initial_epoch"] = initial_epoch
        fit_args.update(kwargs)
        for bs_kwarg in ("batch_size", "validation_batch_size"):
            if bs_kwarg in fit_args:
                if fit_args[bs_kwarg] == -1:
                    try:
                        fit_args[bs_kwarg] = X.shape[0]
                    except AttributeError:
                        raise ValueError(
                            f"`{bs_kwarg}=-1` requires that `X` implement `shape`"
                        )
        fit_args = {k: v for k, v in fit_args.items() if not k.startswith("callbacks")}
        fit_args["callbacks"] = self._fit_callbacks

        if self._random_state is not None:
            with tensorflow_random_state(self._random_state):
                hist = self.model_.fit(x=X, y=y, **fit_args)
        else:
            hist = self.model_.fit(x=X, y=y, **fit_args)

        if not warm_start or not hasattr(self, "history_") or initial_epoch == 0:
            self.history_ = defaultdict(list)

        for key, val in hist.history.items():
            try:
                key = metric_name(key)
            except ValueError as e:
                # Keras puts keys like "val_accuracy" and "loss" and
                # "val_loss" in hist.history
                if "Unknown metric function" not in str(e):
                    raise e
            self.history_[key] += val

    def _check_model_compatibility(self, y: np.ndarray) -> None:
        """Checks that the model output number and y shape match.

        This is in place to avoid cryptic TF errors.
        """
        # check if this is a multi-output model
        if getattr(self, "n_outputs_expected_", None):
            # n_outputs_expected_ is generated by data transformers
            # we recognize the attribute but do not force it to be
            # generated
            if self.n_outputs_expected_ != len(self.model_.outputs):
                raise ValueError(
                    "Detected a Keras model input of size"
                    f" {self.n_outputs_expected_ }, but {self.model_} has"
                    f" {len(self.model_.outputs)} outputs"
                )
        # check that if the user gave us a loss function it ended up in
        # the actual model
        init_params = inspect.signature(self.__init__).parameters
        if "loss" in init_params:
            default_val = init_params["loss"].default
            if all(
                isinstance(x, (str, losses_module.Loss, type))
                for x in [self.loss, self.model_.loss]
            ):  # filter out loss list/dicts/etc.
                if default_val is not None:
                    default_val = loss_name(default_val)
                given = loss_name(self.loss)
                got = loss_name(self.model_.loss)
                if given != default_val and got != given:
                    raise ValueError(
                        f"loss={self.loss} but model compiled with {self.model_.loss}."
                        " Data may not match loss function!"
                    )

    def _validate_data(
        self, X=None, y=None, reset: bool = False, y_numeric: bool = False
    ) -> Tuple[np.ndarray, Union[np.ndarray, None]]:
        """Validate input arrays and set or check their meta-parameters.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape \
           (n_samples, n_features)
            The input samples. If None, ``check_array`` is called on y and
            ``check_X_y`` is called otherwise.
        y : Union[array-like, sparse matrix, dataframe] of shape \
            (n_samples,), default=None
            The targets. If None, ``check_array`` is called on X and
            ``check_X_y`` is called otherwise.
        reset : bool, default=False
            If True, override all meta attributes.
            If False, verify that they haven't changed.
        y_numeric : bool, default = False
            If True, ensure y is a numeric dtype.
            If False, allow non-numeric y to pass through.

        Returns
        -------
        Tuple[np.ndarray, Union[np.ndarray, None]]
            The validated input.
        """

        def _check_array_dtype(arr, force_numeric):
            if not isinstance(arr, np.ndarray):
                return _check_array_dtype(np.asarray(arr), force_numeric=force_numeric)
            elif (
                arr.dtype.kind not in ("O", "U", "S") or not force_numeric
            ):  # object, unicode or string
                # already numeric
                return None  # check_array won't do any casting with dtype=None
            else:
                # default to TFs backend float type
                # instead of float64 (sklearn's default)
                return tf.keras.backend.floatx()

        if X is not None and y is not None:
            X, y = check_X_y(
                X,
                y,
                allow_nd=True,  # allow X to have more than 2 dimensions
                multi_output=True,  # allow y to be 2D
                dtype=None,
                accept_sparse=True,
            )

        if y is not None:
            y = check_array(
                y,
                ensure_2d=False,
                allow_nd=False,
                dtype=_check_array_dtype(y, force_numeric=y_numeric),
            )
            y_dtype_ = y.dtype
            y_ndim_ = y.ndim
            if reset:
                self.target_type_ = self._type_of_target(y)
                self.y_dtype_ = y_dtype_
                self.y_ndim_ = y_ndim_
            else:
                if not np.can_cast(y_dtype_, self.y_dtype_):
                    raise ValueError(
                        f"Got y with dtype {y_dtype_},"
                        f" but this {self.__name__} expected {self.y_dtype_}"
                        f" and casting from {y_dtype_} to {self.y_dtype_} is not safe!"
                    )
                if self.y_ndim_ != y_ndim_:
                    raise ValueError(
                        f"y has {y_ndim_} dimensions, but this {self.__name__}"
                        f" is expecting {self.y_ndim_} dimensions in y."
                    )
        if X is not None:
            X = check_array(
                X, allow_nd=True, dtype=_check_array_dtype(X, force_numeric=True)
            )
            X_dtype_ = X.dtype
            X_shape_ = X.shape
            n_features_in_ = X.shape[1]
            if reset:
                self.X_dtype_ = X_dtype_
                self.X_shape_ = X_shape_
                self.n_features_in_ = n_features_in_
            else:
                if not np.can_cast(X_dtype_, self.X_dtype_):
                    raise ValueError(
                        f"Got X with dtype {X_dtype_},"
                        f" but this {self.__name__} expected {self.X_dtype_}"
                        f" and casting from {X_dtype_} to {self.X_dtype_} is not safe!"
                    )
                if len(X_shape_) != len(self.X_shape_):
                    raise ValueError(
                        f"X has {len(X_shape_)} dimensions, but this {self.__name__}"
                        f" is expecting {len(self.X_shape_)} dimensions in X."
                    )
                if X_shape_[1:] != self.X_shape_[1:]:
                    raise ValueError(
                        f"X has shape {X_shape_[1:]}, but this {self.__name__}"
                        f" is expecting X of shape {self.X_shape_[1:]}"
                    )
        return X, y

    def _type_of_target(self, y: np.ndarray) -> str:
        return type_of_target(y)

    @property
    def target_encoder(self):
        """Retrieve a transformer for targets / y.

        Metadata will be collected from ``get_metadata`` if
        the transformer implements that method.
        Override this method to implement a custom data transformer
        for the target.

        Returns
        -------
        target_encoder
            Transformer implementing the sklearn transformer
            interface.
        """
        return FunctionTransformer()

    @property
    def feature_encoder(self):
        """Retrieve a transformer for features / X.

        Metadata will be collected from ``get_metadata`` if
        the transformer implements that method.
        Override this method to implement a custom data transformer
        for the features.

        Returns
        -------
        sklearn transformer
            Transformer implementing the sklearn transformer
            interface.
        """
        return FunctionTransformer()

    def fit(self, X, y, sample_weight=None, **kwargs) -> "BaseWrapper":
        """Constructs a new model with ``model`` & fit the model to ``(X, y)``.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples, where n_samples is the number of samples
            and n_features is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.

        Warnings
        --------
            Passing estimator parameters as keyword arguments (aka as ``**kwargs``) to ``fit`` is not supported by the Scikit-Learn API,
            and will be removed in a future version of SciKeras.
            These parameters can also be specified by prefixing ``fit__`` to a parameter at initialization
            (``BaseWrapper(..., fit__batch_size=32, predict__batch_size=1000)``)
            or by using ``set_params`` (``est.set_params(fit__batch_size=32, predict__batch_size=1000)``).

        Returns
        -------
        BaseWrapper
            A reference to the instance that can be chain called (``est.fit(X,y).transform(X)``).
        """
        # epochs via kwargs > fit__epochs > epochs
        kwargs["epochs"] = kwargs.get(
            "epochs", getattr(self, "fit__epochs", self.epochs)
        )
        kwargs["initial_epoch"] = kwargs.get("initial_epoch", 0)

        self._fit(
            X=X,
            y=y,
            sample_weight=sample_weight,
            warm_start=self.warm_start,
            **kwargs,
        )

        return self

    @property
    def initialized_(self) -> bool:
        """Checks if the estimator is intialized.

        Returns
        -------
        bool
            True if the estimator is initialized (i.e., it can
            be used for inference or is ready to train),
            otherwise False.
        """
        return hasattr(self, "model_")

    def _initialize_callbacks(self) -> None:
        params = self.get_params()
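        # Illustrative note (not part of the library source): acceptable forms include
        #   callbacks=tf.keras.callbacks.EarlyStopping(patience=5)          # a single instance
        #   callbacks=[tf.keras.callbacks.EarlyStopping(patience=5), ...]   # a list
        #   fit__callbacks=[...]                                            # routed, used only during fit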

        def initialize(destination: str):
            if params.get(destination) is not None:
                callback_kwargs = route_params(
                    params, destination=destination, pass_filter=set()
                )
                callbacks = unflatten_params(
                    items=params[destination], params=callback_kwargs
                )
                if isinstance(callbacks, Mapping):
                    # Keras does not officially support dicts, convert to a list
                    callbacks = list(callbacks.values())
                elif isinstance(callbacks, tf.keras.callbacks.Callback):
                    # a single instance, not officially supported so wrap in a list
                    callbacks = [callbacks]
                err = False
                if not isinstance(callbacks, List):
                    err = True
                for cb in callbacks:
                    if isinstance(cb, List):
                        for nested_cb in cb:
                            if not isinstance(nested_cb, tf.keras.callbacks.Callback):
                                err = True
                    elif not isinstance(cb, tf.keras.callbacks.Callback):
                        err = True
                if err:
                    raise TypeError(
                        "If specified, ``callbacks`` must be one of:"
                        "\n - A dict of string keys with callbacks or lists of callbacks as values"
                        "\n - A list of callbacks or lists of callbacks"
                        "\n - A single callback"
                        "\nWhere each callback can be a instance of `tf.keras.callbacks.Callback` or a sublass of it to be compiled by SciKeras"
                    )
            else:
                callbacks = []
            return callbacks

        all_callbacks = initialize("callbacks")
        self._fit_callbacks = all_callbacks + initialize("fit__callbacks")
        self._predict_callbacks = all_callbacks + initialize("predict__callbacks")

    def _initialize(
        self, X: np.ndarray, y: Union[np.ndarray, None] = None
    ) -> Tuple[np.ndarray, np.ndarray]:

        # Handle random state
        if isinstance(self.random_state, np.random.RandomState):
            # Keras needs an integer
            # we sample an integer and use that as a seed
            # Given the same RandomState, the seed will always be
            # the same, thus giving reproducible results
            state = self.random_state.get_state()
            r = np.random.RandomState()
            r.set_state(state)
            self._random_state = r.randint(low=1)
        else:
            # int or None
            self._random_state = self.random_state

        X, y = self._validate_data(X, y, reset=True)

        self.target_encoder_ = self.target_encoder.fit(y)
        target_metadata = getattr(self.target_encoder_, "get_metadata", dict)()
        vars(self).update(**target_metadata)
        self.feature_encoder_ = self.feature_encoder.fit(X)
        feature_meta = getattr(self.feature_encoder, "get_metadata", dict)()
        vars(self).update(**feature_meta)

        self.model_ = self._build_keras_model()
        self._initialize_callbacks()

        return X, y

    def initialize(self, X, y=None) -> "BaseWrapper":
        """Initialize the model without any fitting.

        You only need to call this method if you explicitly do not want to do any fitting
        (for example, with a pretrained model). You should _not_ call this
        right before calling ``fit``; calling ``fit`` will do this automatically.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
                Training samples where n_samples is the number of samples
                and `n_features` is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape \
            (n_samples,) or (n_samples, n_outputs), default None
            True labels for X.

        Returns
        -------
        BaseWrapper
            A reference to the BaseWrapper instance for chained calling.
        """
        self._initialize(X, y)
        return self  # to allow chained calls like initialize(...).predict(...)

    def _fit(
        self,
        X,
        y,
        sample_weight,
        warm_start: bool,
        epochs: int,
        initial_epoch: int,
        **kwargs,
    ) -> None:
        """Constructs a new model with ``model`` & fit the model to ``(X, y)``.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
                Training samples where `n_samples` is the number of samples
                and `n_features` is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.
        warm_start : bool
            If True, don't rebuild the model.
        epochs : int
            Number of passes over the entire dataset for which to train the
            model.
        initial_epoch : int
            Epoch at which to begin training.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.
        """
        # Data checks
        if not ((self.warm_start or warm_start) and self.initialized_):
            X, y = self._initialize(X, y)
        else:
            X, y = self._validate_data(X, y)
        self._ensure_compiled_model()

        if sample_weight is not None:
            X, y, sample_weight = self._validate_sample_weight(X, y, sample_weight)

        y = self.target_encoder_.transform(y)
        X = self.feature_encoder_.transform(X)

        self._check_model_compatibility(y)

        self._fit_keras_model(
            X,
            y,
            sample_weight=sample_weight,
            warm_start=warm_start,
            epochs=epochs,
            initial_epoch=initial_epoch,
            **kwargs,
        )

    def partial_fit(self, X, y, sample_weight=None, **kwargs) -> "BaseWrapper":
        """Fit the estimator for a single epoch, preserving the current
        training history and model parameters.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples where n_samples is the number of samples
            and n_features is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape \
            (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.

        Returns
        -------
        BaseWrapper
            A reference to the instance that can be chain called
            (ex: instance.partial_fit(X, y).transform(X) )
        """
        if "epochs" in kwargs:
            raise TypeError(
                "Invalid argument `epochs` to `partial_fit`: `partial_fit` always trains for 1 epoch"
            )
        if "initial_epoch" in kwargs:
            raise TypeError(
                "Invalid argument `initial_epoch` to `partial_fit`: `partial_fit` always trains for from the current epoch"
            )

        self._fit(
            X,
            y,
            sample_weight=sample_weight,
            warm_start=True,
            epochs=1,
            initial_epoch=self.current_epoch,
            **kwargs,
        )
        return self

    def _predict_raw(self, X, **kwargs):
        """Obtain raw predictions from Keras Model.

        For classification, this corresponds to predict_proba.
        For regression, this corresponds to predict.
        """
        # check if fitted
        if not self.initialized_:
            raise NotFittedError(
                "Estimator needs to be fit before `predict` " "can be called"
            )
        # basic input checks
        X, _ = self._validate_data(X=X, y=None)

        # pre process X
        X = self.feature_encoder_.transform(X)

        # filter kwargs and get attributes for predict
        params = self.get_params()
        pred_args = route_params(
            params, destination="predict", pass_filter=self._predict_kwargs, strict=True
        )
        pred_args = {
            k: v for k, v in pred_args.items() if not k.startswith("callbacks")
        }
        pred_args["callbacks"] = self._predict_callbacks
        pred_args.update(kwargs)
        if "batch_size" in pred_args:
            if pred_args["batch_size"] == -1:
                try:
                    pred_args["batch_size"] = X.shape[0]
                except AttributeError:
                    raise ValueError(
                        "`batch_size=-1` requires that `X` implement `shape`"
                    )

        # predict with Keras model
        y_pred = self.model_.predict(x=X, **pred_args)

        return y_pred

    def predict(self, X, **kwargs):
        """Returns predictions for the given test data.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples where n_samples is the number of samples
            and n_features is the number of features.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.predict``.

        Warnings
        --------
            Passing estimator parameters as keyword arguments (aka as ``**kwargs``) to ``predict`` is not supported by the Scikit-Learn API,
            and will be removed in a future version of SciKeras.
            These parameters can also be specified by prefixing ``predict__`` to a parameter at initialization
            (``BaseWrapper(..., fit__batch_size=32, predict__batch_size=1000)``)
            or by using ``set_params`` (``est.set_params(fit__batch_size=32, predict__batch_size=1000)``).

        Returns
        -------
        array-like
            Predictions, of shape (n_samples,) or (n_samples, n_outputs).
        """
        # predict with Keras model
        y_pred = self._predict_raw(X=X, **kwargs)

        # post process y
        y_pred = self.target_encoder_.inverse_transform(y_pred)

        return y_pred

    @staticmethod
    def scorer(y_true, y_pred, **kwargs) -> float:
        """Scoring function for model.

        This is not implemented in BaseWrapper, it exists
        as a stub for documentation.

        Parameters
        ----------
        y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True labels.
        y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Predicted labels.
        **kwargs: dict
            Extra parameters passed to the scorer.

        Returns
        -------
        float
            Score for the test data set.
        """
        raise NotImplementedError("Scoring is not implemented on BaseWrapper.")

    def score(self, X, y, sample_weight=None) -> float:
        """Returns the score on the given test data and labels.

        No default scoring function is implemented in BaseWrapper,
        you must subclass and implement one.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Test input samples, where n_samples is the number of samples
            and n_features is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape \
            (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.

        Returns
        -------
        float
            Score for the test data set.
        """
        # validate y
        _, y = self._validate_data(X=None, y=y)

        # validate sample weights
        if sample_weight is not None:
            X, y, sample_weight = self._validate_sample_weight(
                X=X, y=y, sample_weight=sample_weight
            )

        # compute Keras model score
        y_pred = self.predict(X)

        # filter kwargs and get attributes for score
        params = self.get_params()
        score_args = route_params(params, destination="score", pass_filter=set())

        return self.scorer(y, y_pred, sample_weight=sample_weight, **score_args)

    def _get_metadata(self) -> Dict[str, Any]:
        """Meta parameters (parameters created by fit, like
        n_features_in_ or target_type_).

        Returns
        -------
        Dict[str, Any]
            Dictionary of meta parameters
        """
        return {
            k: v
            for k, v in self.__dict__.items()
            if (len(k) > 1 and k[-1] == "_" and k[-2] != "_" and k[0] != "_")
        }

    def set_params(self, **params) -> "BaseWrapper":
        """Set the parameters of this estimator.

        The method works on simple estimators as well as on nested objects
        (such as pipelines). The latter have parameters of the form
        ``<component>__<parameter>`` so that it's possible to update each
        component of a nested object.
        This also supports routed parameters, eg: ``classifier__optimizer__learning_rate``.

        Parameters
        ----------
        **params : dict
            Estimator parameters.

        Returns
        -------
        BaseWrapper
            Estimator instance.
        """
        for param, value in params.items():
            if any(
                param.startswith(prefix + "__") for prefix in self._routing_prefixes
            ):
                # routed param
                setattr(self, param, value)
            else:
                try:
                    super().set_params(**{param: value})
                except ValueError:
                    # Give a SciKeras specific user message to aid
                    # in moving from the Keras wrappers
                    raise ValueError(
                        f"Invalid parameter {param} for estimator {self.__name__}."
                        "\nThis issue can likely be resolved by setting this parameter"
                        f" in the {self.__name__} constructor:"
                        f"\n`{self.__name__}({param}={value})`"
                        "\nCheck the list of available parameters with"
                        " `estimator.get_params().keys()`"
                    ) from None
        return self

    def _get_param_names(self):
        """Get parameter names for the estimator"""
        return (
            k for k in self.__dict__ if not k.endswith("_") and not k.startswith("_")
        )

    def _more_tags(self):
        """Get sklearn tags for the estimator"""
        tags = super()._more_tags()
        tags.update(self._tags)
        return tags

    def __repr__(self):
        repr_ = str(self.__name__)
        repr_ += "("
        params = self.get_params()
        if params:
            repr_ += "\n"
        for key, val in params.items():
            repr_ += "\t" + key + "=" + str(val) + "\n"
        repr_ += ")"
        return repr_


class KerasClassifier(BaseWrapper, ClassifierMixin):
    """Implementation of the scikit-learn classifier API for Keras.

    Below is a list of SciKeras-specific parameters. For details on other parameters,
    please see the `tf.keras.Model documentation <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`_.

    Parameters
    ----------
    model : Union[None, Callable[..., tf.keras.Model], tf.keras.Model], default None
        Used to build the Keras Model. When called,
        must return a compiled instance of a Keras Model
        to be used by `fit`, `predict`, etc.
        If None, you must implement ``_keras_build_fn``.
    optimizer : Union[str, tf.keras.optimizers.Optimizer, Type[tf.keras.optimizers.Optimizer]], default "rmsprop"
        This can be a string for Keras' built in optimizers,
        an instance of tf.keras.optimizers.Optimizer
        or a class inheriting from tf.keras.optimizers.Optimizer.
        Only strings and classes support parameter routing.
    loss : Union[Union[str, tf.keras.losses.Loss, Type[tf.keras.losses.Loss], Callable], None], default None
        The loss function to use for training.
        This can be a string for Keras' built in losses,
        an instance of tf.keras.losses.Loss
        or a class inheriting from tf.keras.losses.Loss.
        Only strings and classes support parameter routing.
    random_state : Union[int, np.random.RandomState, None], default None
        Set the Tensorflow random number generators to a
        reproducible deterministic state using this seed.
        Pass an int for reproducible results across multiple
        function calls.
    warm_start : bool, default False
        If True, subsequent calls to fit will _not_ reset
        the model parameters but *will* reset the epoch to zero.
        If False, subsequent fit calls will reset the entire model.
        This has no impact on partial_fit, which always trains
        for a single epoch starting from the current epoch.
    batch_size : Union[int, None], default None
        Number of samples per gradient update.
        This will be applied to both `fit` and `predict`. To specify different numbers,
        pass `fit__batch_size=32` and `predict__batch_size=1000` (for example).
        To auto-adjust the batch size to use all samples, pass `batch_size=-1`.
    class_weight : Union[Dict[Any, float], str, None], default None
        Weights associated with classes in the form ``{class_label: weight}``.
        If not given, all classes are supposed to have weight one.
        The "balanced" mode uses the values of y to automatically adjust
        weights inversely proportional to class frequencies in the input data
        as ``n_samples / (n_classes * np.bincount(y))``.
        Note that these weights will be multiplied with sample_weight (passed
        through the fit method) if sample_weight is specified.

    Attributes
    ----------
    model_ : tf.keras.Model
        The instantiated and compiled Keras Model. For pre-built models, this
        will just be a reference to the passed Model instance.
    history_ : Dict[str, List[Any]]
        Dictionary of the format ``{metric_str_name: [epoch_0_data, epoch_1_data, ..., epoch_n_data]}``.
    initialized_ : bool
        True if this estimator has been initialized (i.e. predict can be called upon it).
        Note that this does not guarantee that the model is "fitted": if ``BaseWrapper.initialize``
        was called instead of ``fit``, the model will likely have random weights.
    target_encoder_ : sklearn-transformer
        Transformer used to pre/post process the target y.
    feature_encoder_ : sklearn-transformer
        Transformer used to pre/post process the features/input X.
    n_outputs_expected_ : int
        The number of outputs the Keras Model is expected to have, as determined by ``target_encoder_``.
    target_type_ : str
        One of:

        * 'continuous': y is an array-like of floats that are not all
          integers, and is 1d or a column vector.
        * 'continuous-multioutput': y is a 2d array of floats that are
          not all integers, and both dimensions are of size > 1.
        * 'binary': y contains <= 2 discrete values and is 1d or a column
          vector.
        * 'multiclass': y contains more than two discrete values, is not a
          sequence of sequences, and is 1d or a column vector.
        * 'multiclass-multioutput': y is a 2d array that contains more
          than two discrete values, is not a sequence of sequences, and both
          dimensions are of size > 1.
        * 'multilabel-indicator': y is a label indicator matrix, an array
          of two dimensions with at least two columns, and at most 2 unique
          values.
        * 'unknown': y is array-like but none of the above, such as a 3d
          array, sequence of sequences, or an array of non-sequence objects.
    y_shape_ : Tuple[int]
        Shape of the target y that the estimator was fitted on.
    y_dtype_ : np.dtype
        Dtype of the target y that the estimator was fitted on.
    X_shape_ : Tuple[int]
        Shape of the input X that the estimator was fitted on.
    X_dtype_ : np.dtype
        Dtype of the input X that the estimator was fitted on.
    n_features_in_ : int
        The number of features seen during `fit`.
    n_outputs_ : int
        Dimensions of y that the transformer was trained on.
    n_outputs_expected_ : int
        Number of outputs the Keras Model is expected to have.
    classes_ : Iterable
        The classes seen during `fit`.
    n_classes_ : int
        The number of classes seen during `fit`.
    """

    _estimator_type = "classifier"
    _tags = {
        "multilabel": True,
        "_xfail_checks": {
            "check_fit_idempotent": "tf does not use \
            sparse tensors",
            "check_no_attributes_set_in_init": "can only \
            pass if all params are hardcoded in __init__",
        },
        **BaseWrapper._tags,
    }

    def __init__(
        self,
        model: Union[None, Callable[..., tf.keras.Model], tf.keras.Model] = None,
        *,
        build_fn: Union[
            None, Callable[..., tf.keras.Model], tf.keras.Model
        ] = None,  # for backwards compatibility
        warm_start: bool = False,
        random_state: Union[int, np.random.RandomState, None] = None,
        optimizer: Union[
            str, tf.keras.optimizers.Optimizer, Type[tf.keras.optimizers.Optimizer]
        ] = "rmsprop",
        loss: Union[
            Union[str, tf.keras.losses.Loss, Type[tf.keras.losses.Loss], Callable], None
        ] = None,
        metrics: Union[
            List[
                Union[
                    str,
                    tf.keras.metrics.Metric,
                    Type[tf.keras.metrics.Metric],
                    Callable,
                ]
            ],
            None,
        ] = None,
        batch_size: Union[int, None] = None,
        validation_batch_size: Union[int, None] = None,
        verbose: int = 1,
        callbacks: Union[
            List[Union[tf.keras.callbacks.Callback, Type[tf.keras.callbacks.Callback]]],
            None,
        ] = None,
        validation_split: float = 0.0,
        shuffle: bool = True,
        run_eagerly: bool = False,
        epochs: int = 1,
        class_weight: Union[Dict[Any, float], str, None] = None,
        **kwargs,
    ):
        super().__init__(
            model=model,
            build_fn=build_fn,
            warm_start=warm_start,
            random_state=random_state,
            optimizer=optimizer,
            loss=loss,
            metrics=metrics,
            batch_size=batch_size,
            validation_batch_size=validation_batch_size,
            verbose=verbose,
            callbacks=callbacks,
            validation_split=validation_split,
            shuffle=shuffle,
            run_eagerly=run_eagerly,
            epochs=epochs,
            **kwargs,
        )
        self.class_weight = class_weight

    def _type_of_target(self, y: np.ndarray) -> str:
        target_type = type_of_target(y)
        if target_type == "binary" and self.classes_ is not None:
            # check that this is not a multiclass problem missing categories
            target_type = type_of_target(self.classes_)
        return target_type

    @property
    def _fit_kwargs(self) -> Set[str]:
        # remove class_weight since KerasClassifier re-processes it into sample_weight
        return BaseWrapper._fit_kwargs - {"class_weight"}

    @staticmethod
    def scorer(y_true, y_pred, **kwargs) -> float:
        """Scoring function for KerasClassifier.

        KerasClassifier uses ``sklearn_accuracy_score`` by default.
        To change this, override this method.

        Parameters
        ----------
        y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True labels.
        y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
            Predicted labels.
        **kwargs: dict
            Extra parameters passed to the scorer.

        Returns
        -------
        float
            Score for the test data set.
        """
        return sklearn_accuracy_score(y_true, y_pred, **kwargs)

    @property
    def target_encoder(self):
        """Retrieve a transformer for targets / y.

        For ``KerasClassifier.predict_proba`` to
        work, this transformer must accept a ``return_proba``
        argument in ``inverse_transform`` with a default value
        of False.

        Metadata will be collected from ``get_metadata`` if
        the transformer implements that method.
        Override this method to implement a custom data transformer
        for the target.

        Returns
        -------
        sklearn-transformer
            Transformer implementing the sklearn transformer
            interface.
        """
        categories = "auto" if self.classes_ is None else [self.classes_]
        return ClassifierLabelEncoder(loss=self.loss, categories=categories)

    def initialize(self, X, y) -> "KerasClassifier":
        """Initialize the model without any fitting.
        You only need to call this method if you explicitly do not want to do any fitting
        (for example, with a pretrained model). You should _not_ call this
        right before calling ``fit``; calling ``fit`` will do this automatically.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
                Training samples where n_samples is the number of samples
                and `n_features` is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape \
            (n_samples,) or (n_samples, n_outputs), default None
            True labels for X.

        Returns
        -------
        KerasClassifier
            A reference to the KerasClassifier instance for chained calling.
        """
        self.classes_ = None
        super().initialize(X=X, y=y)
        return self

    def fit(self, X, y, sample_weight=None, **kwargs) -> "KerasClassifier":
        """Constructs a new classifier with ``model`` & fit the model to ``(X, y)``.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples, where n_samples is the number of samples
            and n_features is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.

        Warnings
        --------
            Passing estimator parameters as keyword arguments (aka as ``**kwargs``) to ``fit`` is not supported by the Scikit-Learn API,
            and will be removed in a future version of SciKeras.
            These parameters can also be specified by prefixing ``fit__`` to a parameter at initialization
            (``KerasClassifier(..., fit__batch_size=32, predict__batch_size=1000)``)
            or by using ``set_params`` (``est.set_params(fit__batch_size=32, predict__batch_size=1000)``).

        Returns
        -------
        KerasClassifier
            A reference to the instance that can be chain called (``est.fit(X,y).transform(X)``).
        """
        self.classes_ = None
        if self.class_weight is not None:
            sample_weight = 1 if sample_weight is None else sample_weight
            sample_weight *= compute_sample_weight(class_weight=self.class_weight, y=y)
        super().fit(X=X, y=y, sample_weight=sample_weight, **kwargs)
        return self

    def partial_fit(
        self, X, y, classes=None, sample_weight=None, **kwargs
    ) -> "KerasClassifier":
        """Fit classifier for a single epoch, preserving the current epoch
        and all model parameters and state.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples, where n_samples is the number of samples
            and n_features is the number of features.
        y : Union[array-like, sparse matrix, dataframe] of shape (n_samples,) or (n_samples, n_outputs)
            True labels for X.
        classes : ndarray of shape (n_classes,), default=None
            Classes across all calls to partial_fit. Can be obtained via
            np.unique(y_all), where y_all is the target vector of the entire dataset.
            This argument is only needed for the first call to partial_fit and can be
            omitted in subsequent calls. Note that y doesn't need to contain
            all labels in classes. If you do not pass this argument, SciKeras
            will use ``classes=np.unique(y)`` with the y passed in the first call.
        sample_weight : array-like of shape (n_samples,), default=None
            Array of weights that are assigned to individual samples.
            If not provided, then each sample is given unit weight.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.fit``.

        Returns
        -------
        KerasClassifier
            A reference to the instance that can be chain called
            (ex: instance.fit(X,y).transform(X) )
        """
        self.classes_ = (
            classes if classes is not None else getattr(self, "classes_", None)
        )
        if self.class_weight is not None:
            sample_weight = 1 if sample_weight is None else sample_weight
            sample_weight *= compute_sample_weight(class_weight=self.class_weight, y=y)
        super().partial_fit(X, y, sample_weight=sample_weight, **kwargs)
        return self

    def predict_proba(self, X, **kwargs):
        """Returns class probability estimates for the given test data.

        Parameters
        ----------
        X : Union[array-like, sparse matrix, dataframe] of shape (n_samples, n_features)
            Training samples, where n_samples is the number of samples
            and n_features is the number of features.
        **kwargs : Dict[str, Any]
            Extra arguments to route to ``Model.predict``.

        Warnings
        --------
            Passing estimator parameters as keyword arguments (aka as ``**kwargs``) to ``predict_proba`` is not supported by the Scikit-Learn API,
            and will be removed in a future version of SciKeras.
            These parameters can also be specified by prefixing ``predict__`` to a parameter at initialization
            (``KerasClassifier(..., fit__batch_size=32, predict__batch_size=1000)``)
            or by using ``set_params`` (``est.set_params(fit__batch_size=32, predict__batch_size=1000)``).

        Returns
        -------
        array-like, shape (n_samples, n_outputs)
            Class probability estimates.
            In the case of binary classification,
            to match the scikit-learn API,
            SciKeras will return an array of shape (n_samples, 2)
            (instead of `(n_samples, 1)` as in Keras).
        """
        # call the Keras model's predict
        outputs = self._predict_raw(X=X, **kwargs)

        # post process y
        y = self.target_encoder_.inverse_transform(outputs, return_proba=True)

        return y
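
For orientation, a minimal usage sketch of the wrapper shown above (the model-building function, layer sizes, and data are illustrative and not taken from this PR):

import numpy as np
import tensorflow as tf
from scikeras.wrappers import KerasClassifier

def build_clf(meta):
    # `meta` is injected by SciKeras and carries n_features_in_, n_classes_, etc.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(meta["n_features_in_"],)),
        tf.keras.layers.Dense(meta["n_classes_"], activation="softmax"),
    ])

clf = KerasClassifier(
    model=build_clf,
    loss="sparse_categorical_crossentropy",
    optimizer="adam",
    optimizer__learning_rate=1e-3,  # routed to the optimizer constructor
    fit__batch_size=32,             # routed to Model.fit only
    class_weight="balanced",
    epochs=5,
    verbose=0,
)

X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 3, size=100)
clf.fit(X, y)
proba = clf.predict_proba(X)  # shape (100, 3), as described in the docstrings above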

@adriangb
Owner Author

So you just added accept_sparse=True to check_X_y, right?

@mattalhonte-srm

Right!

@adriangb
Owner Author

That is really surprising. If you look at the code in this PR, we are only calling 3 functions:

  1. isspmatrix
  2. Xs.getformat()
  3. Xs_csr.sort_indices()

You uncommented (3), which leaves just the other two: (1) is just an isinstance check and (2) checks if the indices are sorted before sorting them.

The only thing I can think of is that your indices are not sorted, but TensorFlow knows how to handle that anyway, despite what their documentation says and despite the suggestion in the SciPy docs.

I guess the things we can do here are:

  1. Test/check what happens if we give TF unsorted indices
  2. I'll make a branch that does not sort indices, which you can test
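
For reference, a minimal sketch of the two sides of this (the toy matrix here is illustrative, not from the PR): sorting indices on the SciPy side versus letting TensorFlow reorder them.

import numpy as np
import scipy.sparse as sp
import tensorflow as tf

# A 2x3 CSR matrix whose column indices are deliberately out of order in row 0.
data = np.array([1.0, 2.0, 3.0])
indices = np.array([2, 0, 1])  # row 0 stores column 2 before column 0
indptr = np.array([0, 2, 3])
X = sp.csr_matrix((data, indices, indptr), shape=(2, 3))

print(X.has_sorted_indices)  # False: SciPy does not sort on construction
X.sort_indices()             # in-place sort, the SciPy-side fix used in this PR
print(X.has_sorted_indices)  # True

# TensorFlow-side alternative: build a SparseTensor and let TF reorder it,
# which is what the InvalidArgumentError message suggests.
coo = X.tocoo()
st = tf.SparseTensor(
    indices=np.stack([coo.row, coo.col], axis=1).astype(np.int64),
    values=coo.data,
    dense_shape=coo.shape,
)
st = tf.sparse.reorder(st)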

@adriangb
Owner Author

@mattalhonte-srm could you please try `pip install git+https://github.com/adriangb/scikeras.git@sanity-check-sort-indices`?

@mattalhonte-srm

Heya! That branch worked!

@mattalhonte-srm

Thanks so much, this rules!

@mattalhonte-srm

Heya! Thanks again for this, it's been working perfectly! Could it be merged into Master? I wouldn't wanna miss out on new versions of the package!

Thanks!
